A model of the regularities underlying speaker variation: evidence from hybrid synthesis

Author

  • Susan R. Hertz
Abstract

This paper presents the framework of a speech model, tentatively called the “hybrid model,” which explains how listeners can identify phonemes in an incoming speech signal despite the vast amount of cross-speaker and contextual variation. Fundamental to the model are two basic speech units into which listeners process the incoming speech stream: acoustic consonant clusters and acoustic nuclei. Acoustic nuclei are responsible for speaker identity, whereas acoustic consonant clusters are more generic and can even be substituted across speakers with negligible impact on speech quality. The paper focuses on acoustic consonant clusters, showing that much of their variability is perceptually irrelevant and explaining how the hybrid model accounts for listeners’ ability to parse them into phonemes. The paper supports the model as applied to English by drawing on experiments in hybrid synthesis, a technique in which speech is produced by splicing together segments from different speakers (natural or synthetic) [1].
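The splicing idea behind hybrid synthesis can be illustrated with a minimal sketch. This is not the author's implementation: the abstract does not say how joints between segments are handled, so the linear crossfade here is an assumption, and the function name `splice_segments`, the `crossfade` parameter, and the toy sample values are all hypothetical. Segments are plain lists of waveform samples, one taken from each speaker.

```python
from typing import List, Sequence


def splice_segments(segments: Sequence[Sequence[float]], crossfade: int = 0) -> List[float]:
    """Concatenate waveform segments, linearly crossfading up to
    `crossfade` samples at each joint (an assumed smoothing choice)."""
    if crossfade < 0:
        raise ValueError("crossfade must be non-negative")
    out: List[float] = []
    for seg in segments:
        seg = list(seg)
        # The overlap cannot exceed what is already in the output
        # or the length of the incoming segment.
        n = min(crossfade, len(out), len(seg))
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming segment, rising toward 1
            idx = len(out) - n + i
            out[idx] = (1.0 - w) * out[idx] + w * seg[i]
        out.extend(seg[n:])
    return out


# Toy stand-ins for recorded samples (assumed data, not from the paper).
cluster_speaker_a = [0.0, 0.1, 0.2, 0.1]  # hypothetical acoustic consonant cluster
nucleus_speaker_b = [0.5, 0.6, 0.5, 0.4]  # hypothetical acoustic nucleus

hybrid = splice_segments([cluster_speaker_a, nucleus_speaker_b], crossfade=2)
```

With `crossfade=0` the function reduces to plain concatenation; a real system would also need to align pitch, amplitude, and timing across speakers, which this sketch leaves out.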


Similar resources

Convolutional neural network with adaptive windows for speech recognition

Although speech recognition systems are widely used and their accuracy continues to increase, there is a considerable gap between their accuracy and human recognition ability. This is partially due to high speaker variation in the speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...


Speaker adaptation by modeling the speaker variation in a continuous speech recognition system

A method for unsupervised instantaneous speaker adaptation is presented and evaluated on a continuous speech recognition task in a man-machine dialogue system. The method is based on modeling of the systematic speaker variation. The variation is modeled by a low-dimensional speaker space and the classification of speech segments is conditioned by the position in the speaker space. Because the e...


Speaker idiosyncrasy on phonetic regularities in function of temporal parameters of voice

Sex, age, emotional state of the subject, pathological voices ... are characteristics that allow us to recognise different speakers. Accent differentiates persons from different regions, and tone distinguishes otherwise analogous voices; intensity and timbre differentiate voices with similar tones; pauses, tempo, speed of ideas ... shape a unique and distinctive voice for each speaker. Pr...


Phonetic Variation Analysis Via Multi-Factor Sparse Plus Low Rank Language Model

Phonetic transcriptions contain rich information about language. First, the sequential patterns in phonetic transcripts reveal information about the language’s phonotactics. When combined with lexical information, this can help to grow or correct pronunciation dictionaries and to improve grapheme-to-phoneme prediction. Second, the places where pronunciations deviate from the norm can be equally...


Peeling the Onion: A Textual CDA of Research Articles in Humanities and Basic Sciences

This study aimed to investigate the disciplinary and cross-disciplinary variations of research article Introduction sections in 2 disciplines (i.e., humanities and basic sciences). Ninety research article Introduction sections (i.e., 15 from each discipline of applied linguistics, sociology, psychology, biology, agriculture, and geology) were examined. The study was conducted with reference to ...




Publication date: 2006